Numerical Optimization of Eigenvalues of Hermitian 

Matrix Functions 

Mustafa Kilig* Emre Mengi^ E. Alper Yildirim^ 
September 12, 2011 



Abstract 

The unordered eigenvalues of a Hermitian matrix function depending on one parameter 
analytically is analytic. Ordering these eigenvalues yields piece-wise analytic functions. 
For multi-variatc Hermitian matrix functions depending on d parameters analytically the 
ordered eigenvalues are piece-wise analytic along lines in the d-dimensional space. These 
classical results imply the boundedness of the second derivatives of the pieces defining the 
eigenvalue functions along any direction. We derive an algorithm based on the bound- 
edness of these second derivatives for the global minimization of an eigenvalue of an 
analytic Hermitian matrix function. The algorithm is globally convergent. It determines 
the globally minimal value of a piece-wise quadratic under-estimator for the eigenvalue 
function repeatedly. In the multi-variate case determining this globally minimal value 
is equivalent to the solution of a quadratic program. The derivatives of the eigenvalue 
functions arc used to construct quadratic models yielding rapid global convergence as 
compared to traditional global optimization algorithms. The applications that we have 
in mind include the Hao norm of a linear system, numerical radius, distance to uncon- 
trollability and distance to the nearest defective matrix. 
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1 Introduction 

The main object of this work is a matrix-vahied function A{uj) : M.'^ ^nxn ^^^^^ satisfies 
the fohowing two properties. 

(i) Analyticity : The function A{lu) has a convergent power series representation for all 

(ii) Self-adjointness : The function A{ll!) is Hermitian {i.e. A{uj) = A*{uj)) for all w e M'^. 

Here we consider the global minimization or maximization of a prescribed eigenvalue A(w) of 
A{uj) over uj € M'' numerically. From an application point of view prescribed typically refers 
to the jth largest eigenvalue, i.e., X{u!) := Xj{A{uj)). But it might as well refer to a particular 
eigenvalue with respect to a different criterion as long as some (piece-wise) analyticity prop- 
erties discussed below and in Section [2] hold. For instance, for a univariate Hermitian matrix 
function, X{io) might be anyone of the n roots of the characteristic polynomial of A{lu) that 
varies analytically with respect to w as described next. 

Let us start by considering the univariate case. The roots of the characteristic polynomial 
of A{oj) : M — > C"^" can be arranged as Xi{uj), . . . , A„(w) so that each Xj{oj) is analytic over 
M for j — 1, . . . , n. Remarkably this analyticity property of eigenvalues of Hermitian matrix 
functions holds even if some of the eigenvalues repeat, that is even if Xjiuj) = XkiiS) for ] ^ k. 
On the other hand this analyticity property of eigenvalues is intrinsic to Hermitian matrix 
functions only, and is not true for non-Hermitian matrix functions. For instance analytic 
perturbations of order 0(e) to the entries of an n x n Jordan block may yield variations in 
eigenvalues proportional to 0(e^/"). 

When the eigenvalues Ai(cj), . . . , A„(w) are ordered from largest to smallest, they are no 
longer analytic, but continuous and piece-wise analytic. What we exploit in this paper for 
the optimization of an eigenvalue X{uj) is the boundedness of the derivatives of the analytic 
pieces. Particularly there exists a constant 7 satisfying 

X'j(uj) < 7 Vcj e M 

for j — 1, . . . ,rt. The function X{io) is non-convex possibly with many local extrema, so its 
global optimization cannot be achieved solely based on derivatives. For global optimization 
one must also benefit from global properties such as the upper bound on the second derivatives 

Contrary to the univariate case, the roots of the characteristic polynomial of a multivariate 
Hermitian matrix function A{uj) : K'* — )■ C"^" are not analytic no matter how they are 
ordered. But along any line in M'^ there is an ordering that makes eigenvalues analytic, and 
ordering them from largest to smallest makes them continuous and piece-wise analytic. The 
(piece-wise) analyticity of eigenvalues along lines in M.'^ is our main tool for the optimization 
of an eigenvalue X{uj) in the multi-variate case. 

To the authors' knowledge the most elementary examples that require the optimization of 
eigenvalues of Hermitian matrix functions are the distance to instability 

inf{||Ayl||2 : x'{t) = (A + AA)x{t) is unstable} = inft^gR(T„(A - uil), 
inf{|| AAII2 : Xk+i = {A + AA)xk is unstable} = Mg^io,2-^) <^n{A - e^^l) 
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for continuous and discrete systems, respectively. Above ct„ denotes the smallest singular 



value. For instance to minimize cr„(A 
smallest eigenvalue of 



A* 



ujil) the associated eigenvalue function A(w) is the 





' ujil 



A 



Loil 




in absolute value, which is continuous and piece-wise analytic with respect to w. Some other 
examples include the numerical radius of a matrix, Hoo norm of a transfer function, distance 
to uncontroUability from a linear time-invariant dynamical system, distance to the nearest 
matrix with an eigenvalue of specified algebraic multiplicity (in particular distance to the 
nearest defective matrix) , distance to the nearest pencil with eigenvalues in a specified region. 
The eigenvalue optimization problems associated with these problems are listed in the table 
below. In some cases the supremum of an eigenvalue needs to be minimized. In these examples 
A(aj) is the supremum of the eigenvalue, instead of the eigenvalue itself. Note that if f{x,y) 
with X G K"*,?/ S K depends on each xj and y piece-wise analytically, jointly. Then g{x) ~ 
swpy f{x,y) is also a piece-wise analytic function of each Xj, jointly. Below D,'' denotes the 
r-tuples of C C, denotes the vector of ones of size r, the jth largest singular value is 
denoted by aj, the notation is reserved for the Kronecker product, and 



C(A,r) := 



Ai 712/ 
A2 



llrl 
l2rl 

A,. 



for A = [ Ai 



Ar ] e rj'' and r = [ 712 



7(.-i). ] e C'-('-i)/2. 



Problem 


Optimization Characterization 


Numerical Radius 


, /Ae"' + A*e-"'\ 
sup Ai 
ee[o,27r) V ^ / 


i/oo-Norm of a Linear Time-invariant 
System {A,B,C,D) 


sup ai{C{ujiI - A)-'B + D) 


Distance from a Linear Time-Invariant 
System (A, B) to UncontroUability 


inf a„([ A-zI B \) 


Distance from A to Nearest Matrix with 
an Eigenvalue of Multiplicity > r 


inf sup anr-r+l {I ^ A ~ C{Xer,T) I) 


Distance from A — XB to Nearest Pencil 
with r eigenvalues in region C C 


inf sup cr„r_,.+i (/«) A - C(A,r) (g) S) 

Aen^ PgCr(r-l)/2 



The distance to instability was first considered by Van Loan [34] in 1984, and used to 
analyze the transient behavior of the dynamical system x'{t) — Ax{t). Various algorithms 
have been suggested [S] Uni [ISl IH] since then for its numerical computation. Specifically 
Byers' work j8j inspired many other algorithms, each of which is tailored for a particular 
eigenvalue optimization problem. In control theory it is essential to compute the iJoo norm 
of the transfer function of a dynamical system for various purposes, e.g., controller synthesis. 
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model reduction, etc. In two independent works Boyd and Balakrishnan fS', and Bruinsma 
and Steinbuch |B] extended the Byers' algorithm for the computation of the iJoo norm of the 
transfer function of a linear dynamical system. The distance to uncontroUability was originally 
introduced by Paige [28j . and the eigenvalue optimization characterization was provided in 
|12j . Various algorithms appeared in the literature for the computation of the distance to 
uncontroUability; see for instance [HI HSl [lEl HZ] • Malyshev derived an eigenvalue optimization 
characterization for the distance to the nearest defective matrix |25) , but did not elaborate on 
how to solve it numerically. The second author generalized the Malyshev's characterization 
for the distance to the nearest matrix with an eigenvalue of specified multiplicity [26]. More 
recently the eigenvalue optimization characterization with many unknowns is deduced for the 
distance to a nearest pencil with eigenvalues lying in a specified region [23] ; [13 [55] are special 
cases of [ISj. This last problem has an application in signal processing, namely estimating 
the shape of a region given the moments over the region as suggested by Elad, Milanfar and 
Golub ,13 . 

All of the aforementioned algorithms are devised for particular problems. The algorithm 
that we introduce and analyze here is the first generic algorithm for the optimization of 
eigenvalues of Hermitian matrix functions depending on its parameters analytically. The 
algorithm is based on piece-wise quadratic models lying underneath the eigenvalue function. 
Consequently, the algorithm here is a reminiscent of the Piyavskii-Shubert algorithm [29l [33] , 
that is well-known by the global optimization community and based on constructing piece- wise 
linear approximations for a Lipschitz continuous function lying underneath the function. The 
Piyavskii-Shubert algorithm is derivative-free, and later sophisticated variants, attempting to 
estimate the Lipschitz constant locally, appeared in the literature [5T1 132] • The algorithm here 
exploits the derivatives, and the use of quadratic under-estimators yields faster convergence. It 
is applicable for the optimization of other (piece-wise) analytic functions. But it is particularly 
well-suited for the optimization of eigenvalues. For an eigenvalue function, once the eigenvalue 
and the associated eigenvector are evaluated, its derivative is available without any other 
significant work due to analytic formulas; see the next section, in particular equation ([3]). 

The outline of this paper is as follows. In the next section we review the basic results 
concerning the analyticity of the eigenvalues of a Hermitian matrix function A{uj) that depends 
on u! analytically. These basic results are in essence due to Rellich [3D] . Also in the next section 
we derive expressions for the first two derivatives of the unordered eigenvalue A(w). To our 
knowledge these expressions first appeared in a Numerische Mathematik paper by Lancaster 
[23]. Section |3] is devoted to the derivation of the one-dimensional algorithm. Section |4] 
extends the algorithm to the multi-variate case. Section [5] focuses on the analysis of the 
multi-dimensional algorithm; specifically it establishes that there are subsequences of the 
sequence generated by the algorithm that converge to global minimizers. Section [6] describes 
a practical variant of the multi-dimensional algorithm, that is based on mesh-refinement. The 
algorithm suggested here and its convergence analysis applies to the global optimization of 
any continuous and piece- wise analytic function. Finally Section [7| discusses how in particular 
the algorithm can be applied to the eigenvalue optimization problems listed above. 
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2 Background on Perturbation Theory of Eigenvalues 

In this section we first briefly summarize the analyticity resuhs, mostly borrowed from |30[ 
Chapter 1], related to the eigenvalues of matrix functions. Then expressions are derived 
for the derivatives of Hermitian eigenvalues in terms of eigenvectors and the derivatives of 
matrix functions. Finally, we elaborate on the analyticity of singular value problems as special 
Hermitian eigenvalue problems. 



2.1 Analyticity of Eigenvalues 
2.1.1 Univariate Matrix Functions 

For a univariate matrix function A{uj), Hermitian or not, but depending on ui analytically, 
the characteristic polynomial is of the form 

g{uj, A) := det(A/ - A{uj)) = a„(a;)A" H h ai(a;)A + ao{uj) 

where ao(a;), . . . ,a„(a;) are analytic functions of uj. It follows from the Puiseux' theorem (see 
for instance [351 Chapter 2]) that each root Xj{uj) such that 5(0;, Aj(a;)) = has a Puiseux 
series of the form 

CO 

A,M = ^c,,,c.'=/- (1) 

k=0 

for all small w where r is the multiplicity of the root Xj{0). 

Now suppose A{uj) is Hermitian for all u, and let £ be the smallest integer such that 
Ci j 7^ 0. Then we have 

lim ^>lllMO)=e., 
implying Ci,j is real, since for all uj and w^^'" are real numbers. Furthermore 

Aj-(a;)-Aj(0) _ Q,^ 

is real implying (—1)^/'' is real, equivalently £/r is integer. This shows that the first nonzero 
term in the Puiseux series of Aj(w) is an integer power of uj. The same argument applied to 
the derivatives of Aj (w) and the associated Puiseux series indicates that only integer powers 
of UJ can appear in the Puiseux series ([T]), that is the Puiseux series reduces to a power series. 
This establishes that Xj{uj) is an analytic function of uj. Indeed it can also be deduced that 
associated with Aj(w) there is a unit eigenvector Vj{uj) that varies analytically with respect 
to uj (see [3D] for details). 

Theorem 2.1 (Rellich). Let A{uj) : R —?' C"^" be a Hermitian matrix function that depends 
on UJ analytically. 

(i) The n roots of the characteristic polynomial of A{uj) can be arranged so that each root 
Xj{uj) for j — I, ... ,n is an analytic function of uj. 

(ii) There exists an eigenvector Vj{uj) associated with Xj{uj) for j = 1, . . . ,n satisfying 
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(1) (\j{Lu)I - A{uj)j Vjicu) = e M, 

(2) \\vj{lu)\\2 = 1 Vcj e 

(3) v*{Lu)vk{i-tj) = Vo; G M for k ^ j, and 

(4) Vj{uj) is an analytic function ofuj. 

For non- Hermitian matrix functions, since the eigenvalue Aj(w) is not real, the argument 
above fails. In this case in the Puiseux series ([T]) non-integer rational powers remain in general. 
For instance the roots of the n x n Jordan block 

"0 1 ... " 
1 ... 

'■. 

1 

_ e _ 

with the lower left-most entry perturbed by e are given by £■ 



2.1.2 Multivariate Matrix Functions 

The eigenvalues of a multivariate matrix function A{oj) : M"^ — > C"^" depending on ut analyt- 
ically do not have power series representations in general even when A{uj) is Hermitian. As 
an example the unordered eigenvalues of 



UJi-\-UJ2 

2 



2 
1^2 



are Ai,2(a;) = ^^^^^ ± ^/^^^. On the other hand it follows from Theorem [Zl] that along 
any line in M'' the unordered eigenvalues Xj{uj), j — 1, . . . ,n oi A{ui) are analytic when A{uj) 
is Hermitian. This analyticity property along lines in M'' implies the existence of the partial 
derivatives of Xj{io) everywhere. Expressions for the partial derivatives will be derived in the 
next subsection indicating their continuity. As a consequence of the continuity of the partial 
derivatives, each unordered eigenvalue Xj{uj) must be differentiable. 

Theorem 2.2. Let A{uj) : M.'^ — S- C"^" be a Hermitian matrix function that depends on uj 
analytically. Then the n roots of the characteristic polynomial of A{uj) can be arranged so 
that each root Aj(aj) is (i) analytic on every line in , and (ii) differentiable on M.'^ . 



2.2 Derivatives of Eigenvalues and Eigenvectors 
2.2.1 First Derivatives of Eigenvalues 

Consider a univariate Hermitian matrix-valued function A{uj) depending on w analytically. 
An unordered eigenvalue Xj{ui) and the associated eigenvector Vj{uj) as described in Theorem 
12.11 satisfies 

Aiuj)vj{u}) = ~Xj{uj)vj{uj). 
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Taking the derivatives of both sides we obtain 

dA(ui) , , ., .dv.(uj) dXj(uj) , , ~ , .dvj(uj) 

Multiplying both sides by Vj{uj)* and using the identities Vj{uj)* A{uj) = Vj(uj)* Xj{uj) as well 
as Vj{Lu)*Vj{u!) ~ \\vj{^^)\\2 = 1 yield 



2.2.2 First Derivatives of Eigenvectors 

It is also possible to deduce the first derivative of Vj [lS) from ([2]). First the identity Vj {uj)*Vj (w) 
1 for all Lu implies 



d{vj{uj)*Vj{uj)) ^ ^ ^ ^^^^^dv^iuj)^ 
du) du) 



that is Vj{u}) and its derivative are orthogonal at all uj. Now by rearranging equation ^ and 
multiplying both sides by {Xj{uj)I — A{uj)y , the Moore-Pcnrose pseudoinverse of {Xj{uj)I — 
A{uj)), we have 

duj duj duj 

Suppose that the multiplicity of Xj{Lj) is one. Then 

(A,-H/-^H)t=^ ^ vu{Lo)v^{^r (5) 

^ Xj{u) - Afe(a;) 

and 

(X,{uj)I ~ A{uo))\X,{uj)I - A(uj)) = 

Consequently, 

(A,(cj)/ - A{uj))^v,{u:) = and (Aj(w)/ - A{uj))\Xj{uj)I - ^ , 

dio duj 

where the last equality is due to the orthogonality of to Vj{uj) meaning e 

span {vk{uj) : k — 1, . . . ,n and k =/= j}. Equation Q simplifies as 



^ = (A,M/-^M)t-A- 



(A,M/-^M)t^^;,.(a;). (6) 



Now suppose that the multiplicity of Xj[u)) is greater than one at a given toi € M. Indeed 
suppose that 

Xj{uJi) = Xa,{0Ji) = • • • = Xa,._i{uJe). 

We define a Hermitian matrix function C{uj) that has the same eigenvalues as A{uj) at uji, 
but the multiplicity of Xj{Lu) is one at all lo =/= uje close to iog. If the eigenvalues of A{uj) 
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already satisfies this property, let C{uj) := A{uj). Otherwise the eigenvalues of A{u}) satisfies 
Aj(a;) = Aaj.(w) for all ui for /c = l,...,r — 1. In this case consider any Hermitian matrix 
function C{uj) with exactly the same eigenvectors and eigenvalues as A{ijj) for all lo, b ut with 



2.1 



induces 



eigenvalue Xa^ (w) replaced by Aq,^ (cj) + k{uj — cu^)'^ for fc = 1, . . . , r — 1. Theorem 
analytic eigenvalue decompositions for A{uj) and C{uj) of the form V{uj)A{uj)V{uj)* where 
V{uj) is unitary and analytic, and A{Ld) is diagonal and analytic. It is apparent from these 
analytic eigenvalue decompositions that 

^M=CM and (7) 
aui auj 

Since the eigenvectors of C(aj) and .4(a;) are identical, their derivatives are equal. Furthermore 
Xj{uji) is an eigenvalue of C{uj£) of multiplicity r, but at all lu ^ uji close to lji the eigenvalue 
Xj{u!) of C{uj) has multiplicity one. Both the left-hand and right-hand sides of equation ^ 
are continuous. Consequently, applying the expression in Q to C{uj) at oj ^ ujg close to ujg 
and using the continuity as well as ([t]) yields 

= iX^iu:,)I-A{u;,)y^v,iu;,). 

This establishes that the expression given in ^ is valid even when the multiplicity of Aj(a;) 
is greater than one. 

2.2.3 Second Derivatives of Eigenvalues 

Finally by differentiating both sides of (|3| we obtain 

By plugging in the expression for the derivative of the eigenvector from (|6| we have 
or by using ([5| for the pseudoinverse 



Vk{uj)*— Vj{uj) 

duj 



(8) 



2.2.4 Derivatives of Eigenvalues for Multivariate Hermitian Matrix Functions 

Let A{uj) : M'' C"""" be Hermitian and analytic. It follows from ^ that 

dXJcu) .dA{uj) 
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Since A{uj) and Vj^uj) are analytic, this implies the continuity (indeed analyticity) of the 
partial derivatives, meaning Aj(w) is differentiable. As a consequence of the analyticity of 
the partial derivatives, all second partial derivatives exist everywhere. Differentiating both 
sides of (|9| with respect to a;^ would yield the following expressions for the second partial 
derivatives. 



dujk duji 



Vj{uj)+v*Auj) 



dA{uj) dvj{uj) ( dvj{uj)\* dA{uj) 



duk dujg 



duif 



dujk 



By eliminating the partial derivatives of the eigenvectors the expression above further simpli- 
fies as 



duJk duji 



tMM, 



or by further eliminating the pseudoinverses we have 



dujk duji 



2.3 Analyticity of Singular Values 

Some of the applications (see Section [7| concern the optimization of the jth largest singular 
value of an analytic matrix function. The singular value problems are special Hermitian 
eigenvalue problems. In particular denote the jth largest singular value of an analytic matrix 
function B{uj) : — ^ j^nxm^ necessarily Hermitian, by <7j{uj). Then the set of eigenvalues 
of the Hermitian matrix function 



Aiio) 



B{uj) 
B{u})* 



is {crj(w), 



j = 1, . . . ,n}. In the univariate case cFj{uj) is the jth largest of the 2n 



unordered analytic eigenvalues, Ai(w), . . . , A2n(w), of A{uj). The multivariate d dimensional 
case is similar, with the exception that each unordered eigenvalue Aj(w) is differentiable and 
analytic along every line in M.'^. Let us focus on the univariate case throughout the rest of 
this subsection. Extensions to the multi-variate case are similar to the previous subsections. 



Suppose Vj{uj) := 



Wj{uj) 



with Uj{(jj),Wj{uj) € C" is the analytic eigenvector function as 



specified in Theorem 2.1 of A{uj) associated with Aj(a;), that is 



B{uj) 
B{uj)* 

The above equation implies 



Aj(w) 



B{uj)wj{ijj) — Xj{uj)uj{uj) and B{uj)*Uj{uj) ~ Xj{u!)wj{uj). 



(10) 
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In other words Uj(a;), 'Wj{uj) are analytic, and consists of a pair of consistent left and right 
singular vectors associated with Xjiu). To summarize in the univariate case Aj(a;) can be 
considered as an unsigned and unordered singular value of B{u))^ and there is a consistent 
pair of analytic left Uj{uj) and right Wj{uj) singular vector functions. 

Now in the univariate case we derive expressions for the first derivative of Xj (uj) , in terms 
of the cor responding left and right singular vectors. It follows from the singular value equa- 
tions (10) above that = ||wj(a;)|| = 1/a/2 (if Xj{uj) — this equality follows from 
analytic! ty). Now the application of expression ^ yields 



did 



dB{uj)/duj 
dB{u)*/duj 



Uj{uj) 
Wj{uj) 



In terms of the unit left Uj{uj) := \/2 • Uj[uj) and right Wj{uj) := \/2 ■ Wj{uj) singular vectors 
associated with Aj(a;) we obtain 



Real 



UJUJ) ; Wi(UJ) 

doj 



(11) 



3 One-Dimensional Algorithm 

We suppose /i, . . . , /„ : M — ^ M are analytic functions. The function / : M — > M is a continuous 
and piece- wise function defined in terms of /i ,...,/„ . Here we describe an algorithm for 
finding a global minimizer of /. Certainly the ordered eigenvalues fit into this framework, 
i.e., the jthe largest eigenvalue Xj is a continuous and piece-wise analytic function defined in 
terms of the unordered eigenvalues Ai, . . . , A„. 

Any pair of functions among /i , . . . , /„ can intersect each other only at finitely many 
isolated points on a finite interval, or otherwise they are identical due to analyticity. The 
finitely many isolated intersection points are the only points where the piece-wise function / 
is possibly not analytic. Clearly / is continuous but may not be differentiable at these points. 

The algorithm is based on a quadratic model qk{x) about a given point G M that 
lies underneath f{x) at all a; e M. For any point x > Xk denote the points where / is not 
differentiable on [2:^,0;] by x^-^^ . . . ,x'^"^\ Then 

where x^^'' Xk and := x. Suppose that 7 is a global upper bound on the second 

derivatives, that is 

|/j'(x)|<7 VxeR 
for J = 1, . . . , n. For some 77 G {xk, t) we have 



f'.it) = fj{xk)+f"iv)it-xk) > r,{xk) - lit - Xk) 
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(13) 



where f'{xk) ■— minfc=i^„ fj{xk)- Using the lower bound (13) in (12) yields 



f{x) > f{xk) + Ej=o IxU) f{xk) - l{t - Xk) dt 
= f{xk) + n^k) - lit - Xk) dt. 



Finally by evaluating the integral on the right we deduce 



f{x) > f{xk) + f_{xk){x-Xk)--{x~Xk)'^ 



at all X > Xk- Similarly at all a: < Xfc we have 



f{x) > f{xk) + f'{xk){x-Xk)--{x-Xkf 




(14) 



about Xk satisfies f{x) > qk{x) for all a; e M. For eigenvalue functions generically the 
multiplicity of the eigenvalue at a global minimizer is one, meaning f{x) = fj{x) at all x close 
to a global minimizer for some j. In this case f{x) is analytic, and the quadratic model could 
be simplified as 



We assume the knowledge of an interval [x, x] in which the global minimizer is contained. 
A high-level description of the algorithm is given below. 

Description of the Algorithm 

1. Initially there are only two quadratic models go and qi about xq :— x and 
xi := X, respectively. 

2. Find the global minimizer of the piecewise quadratic function 



qk{x) = f{xk) + f'{xk){x - Xk) - -{x - Xk)^. 



q^{x) := max qk{x). 



k=0,s 



on [x, x] where s -I- 1 is the number of quadratic models. 



3 



The lower 
given by 



and upper bounds for the globally minimal value of f{x) are 



I — (7s(a;*) and u = min f{xk)- 



4. 



Let Xs+i '■— x^. Evaluate f{xs+i) and f'{xs+i), form the quadratic model 
qs+i about Xg+i, and increment s. 



5 



Repeat steps 2-4 until m — / is less than a prescribed tolerance. 
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Note that, for the optimization of Hermitian eigenvalues, evaluation of f{xs+i) corresponds 
to an eigenvalue computation of an n x n matrix. In this case once f{xs+i), that is the 
eigenvalue and the associated eigenvector, are evaluated, the derivative /'(xg+i) is cheap to 
calculate due to expression (|3]). The first five iterations of the algorithm applied to minimize 
Cmin — ujil) over the real line are illustrated in Figure [l] The red curve is a plot of the 
graph of f{uj) — cr,„in(^ — ^H), whereas the blue curves represent the quadratic models. The 
global minimizers of the piece-wise quadratic model are marked by asterisks. 



4 Multi-Dimensional Algorithm 

Now suppose /i, . . . , /„ : M'' M are functions analytic along each line in R'^, and differen- 
tiable on R''. The function / : M'' — )• M is a continuous and piece- wise function defined in 
terms of /i, . . . , /„. We would like to locate a global minimizer of /. For instance / could be 
considered as the jth largest eigenvalue of a Hermitian matrix function that varies analytically 
in M"*, as the unordered eigenvalues of such matrix functions obey the properties required for 
/!,...,/„ by Theorem [2;2l 

We first derive a quadratic model qk{x) about a given point Xk € R'^ such that qk{x) < f{x) 
for all X S M.'^. Let us consider the direction p :— {x — Xk)/\\x ~ Xk\\, the univariate function 
0(a) := f{xk + ap) and the analytic functions (l)j{a) fj{xk + ap) for j = I, . . . ,n defining 
(p. Also let us denote the finitely many points in the interval [0, ||a; — xi.\\] where (t>{a) is not 
differentiable by Then we have 

fix) = f{xu) + V / cj>'{t)dt (15) 

where a^^^ :— and q;^™^^'' := ||a; — Xk\\- Now due to the differentiability of fj we have 

c^){a)^Vfj{xk + apfp. (16) 
Furthermore since fj (xk + ap) is analytic with respect to a there exists a constant 7 satisfying 

|0"(a)|<7 VaeM (17) 
for j =^ 1, . . . , n. Now as in Section [3] we have 

(j)'(t) > minj=i,„(/i^ (0) - jt. 



By plugging the last inequality in then integrating the right-hand side of (151 we obtain 



f{x) > f{xk) + (minj=i ,„(/;>^(0)) \\x - Xk\\ - -^Ha; - Xk\\ 



Finally using ( 16 1 at a = and p -.^ (x — Xk)/\\x — Xk\\ yields 



f{x) > gfc(a;) := /(xfc) -f (niinj=i^„V/j(xfe)^(a; - Xfc)) - ^||a; - Xfelp. (18) 
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Figure 1: The first five iterations of tfie quadratic- model based algorithm for the minimization 
of piece-wise analytic functions 
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The algorithm in the multivariate case is same as the algorithm given in Section [3] for the 
univariate case, but with the definition of the quadratic model function generalized as in ( |18[ ) . 
As in the univariate case we assume that a box 

B :— B{xi,xi, . . . ,x^,Xd) '■— {x ^R'^ : Xj e [xj,Xj] for j = 1, . . . , c?} (19) 

containing a global minimizer is known, and the algorithm is performed inside this box. For 
convenience and for the convergence analysis in the next section the algorithm is formally 
presented below. 

Algorithm 1 Multidimensional Algorithm 

Require: A continuous and piece-wise function / : M'^ — > M that is defined in terms of 
differentiable functions /-,- for j = 1, . . . , n each of which is analytic along every line in R'^, 



9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 



a scalar 7 > satisfying (I6I, a set i3 given by (19 1, and a tolerance parameter e > 0. 
Pick an arbitrary xq E B. 
ui ^ f{xo); xhest ^ xq. 

■■= qo{x) := f{xo) +minj^i^„{V/j(a;o)^(a; - Xq)} - (7/2)||a; - a;o||^ 
xi i~ argmin^gB Qoix)- 

if f{xi) < ui then 

Ml ^ .f{xi); xbest 4- xi. 
end if 
s ^ 1. 

While Us — Is > e do 
loop 

qs{x) := f{xs) + mmj^i^n{Vfj{xk)'^{x - Xs)} - (7/2)||a; - XsW^. 
q^{x) := maxk=o,sUk(x)}- 
Xs+i ^ argmin^jge Qsix). 

Is+l ^ QsiXs+l)- 

if f{xs+i) < Us then 

Us+i f{xs+i); xbest ^ Xs+i- 
else 

Us+i ^ Us. 

end if 

s ^ s + 1. 
end loop 

Output: Is, Us, xbest. 



In the multivariate case finding a global minimizer of 

Qsix) = max qk{x) 

k—0,s 

is a tough task. For the sake of simplicity first let us suppose / is defined in terms of one 
function, that is / is indeed analytic. Then the quadratic model ( 18 1 simplifies as 

qk{x) := f{xk) + Vf{xkf{x - Xk) - ^W^ - a^fclP- (20) 
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Typically this is a reasonable assumption for the jth largest eigenvalue of a Hermitian matrix 
function in practice. Generically the jth largest eigenvalue is of multiplicity one at all points 
close to a global minimizer. 

To minimize qs{x) with the quadratic model ( |20[ ) we partition the box B into regions 
TZq, . . . ,Tis such that inside the region TZ^. the quadratic function gfc(x) takes the largest 
value (see Figure [2| . Therefore the minimization of (x) over the region TZk is equivalent to 
the minimization of qk^x) over the same region. This can be posed as the quadratic program 
below. 

minimize^gRd qkix) 



subject to qkix)>qi{x), £^k 

Xj (z [XjjXj], j 1 , 



(21) 



Note that the inequalities qk{x) > qi.{x) are linear, as qk{x) and qeix) have the same neg- 
ative curvature, and consequently the quadratic terms of qk{x) and qi(x) cancel out. The 
minimization of qgix) over B can be performed by solving the quadratic program above for 
k — 0, . . . , s. The difficulty with the problem above is due to the negative-definiteness of 
V'^qk{x). This makes the problem non-convex, but the solution is guaranteed to be attained 

s + d-l 



at one of the vertices of the feasible region. Clearly there can be at most j ^ 

vertices, where s -\- d — 1 is the number of constraints. In the 2-dimensional case the number 
of vertices can be proportional to O(s^) in theory, but in practice we observe that the number 
of vertices does not exceed 5-6. In theory solving an indefinite quadratic program is NP-hard. 
The problem above can be expected to be solved efficiently for small d only. We are able to 
solve it efficiently for c? = 2 at the moment. Most of the problems mentioned in the opening 
are either one or two dimensional. For instance the distance to uncontrollability and the 
distance to the nearest matrix with an eigenvalue of specified multiplicity (in particular the 
distance to the nearest defective matrix) are two dimensional. 

Let us comment on finding the global minimizer of qgix) to the full generality with the 



quadratic model (18). We use the notation 



qk,]{x) := f{xk) -f "^fj{xk) {x - Xk) - 2 11^ - ^felP 

so that qk{x) = miuj^i n qkjix). Then the box B could be partitioned into regions Ti-kj, k = 
0, . . . , s, j = 1, . . . ,n where 

• qk,j is not larger than qk.e for £ j, and 

• qk.j is not smaller than at least one of qp^i, £ — 1, . . . ,n for p ^ k. 

Minimization of qg{x) over B can be achieved by minimizing qk,j{x) over TZk.j for k = 1, . . . , s 
and j = 1, . . . ,n. The latter problem could be posed as a quadratic program of the form 

minimize^gH'i qk,]{x) 

subject to qk,]{x) < qk,i{x), i^j . (22) 

V£=i,„ qk,j{x) > qpxix), k 
Xj € [xj ,Xj], j — 1 , . . . , d 
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Figure 2: To minimize the piece-wise quadratic function q{x) over the box B the box is spht 
into regions TZk, k = 1, . . . ,s such that qkix) is the largest inside the region TZk- Above a 
possible partitioning is illustrated in the 2-dimensional case. 



Above the notation \/e=i^n<lk,jix) > Qp,iix) means that at least one of the constraints qk.j{x) > 
qp i {x) for ^ — 1, . . . , n must be satisfied. Clearly this is even a more difficult problem than 
(21). In practice it suffices to solve (21) instead of (22), because the multiplicity of the jth 
largest eigenvalue generically remains one close to a global minimizer. 



5 Convergence Analysis of Multi-Dimensional Algorithm 

In this section we analyze the convergence of Algorithm [T] for the following optimization 
problem. 

(P) r:=min/(x) 

Recall that the algorithm starts off by picking an arbitrary point xq G B. At iteration s, the 
algorithm picks x^+i to be a global minimizer of qs{x) over where q^ix) is the maximum 
of the functions qk{x) constructed at the points x^, k — 0, . . . , s in ;B. Note that {Ig} is a 
non-decreasing sequence of lower bounds on /*, while {us\ is a non-increasing sequence of 
upper bounds on /*. 

We require that / : M"^ — > M is a continuous and piece- wise function defined in terms of the 
differentiable functions fj : K'' — >■ K, j = 1, . . . ,rt. Differentiability of each fj on implies 
the boundedness of ||V/j(x)|| on W^. Consequently, we define 

/i :— max max ||V/j(x)||. 

j = l,n x^B 

We furthermore require each piece fj to be analytic along every line in M'', which implies the 



existence of a scalar 7 > satisfying (17). Our convergence analysis depends on the scalars 
fi and 7. We now establish the convergence of Algorithm [l] to a global minimizer of (P). 



Optimization of Hermitian Eigenvalues 



17 



Theorem 5.1. Let {xs} be the sequence of iterates generated by Algorithm^ Every limit 
point of this sequence is a global minimizer of the problem (P). 

Proof. Since S is a compact set, it follows that the sequence {xs} has at least one limit 
point X* G B. By passing to a subsequence if necessary, we may assume that {xs} itself is 
a convergent sequence. Let I* denote the limit of the bounded nondecreasing sequence {ls\- 
Since Is < f* < f{xs) for each s > 0, it suffices to show that /* = f{x*). 

Suppose, for a contradiction, that there exists a real number 6 > such that 

f{x*)>l*+6. (23) 

By the continuity of /, there exists si G N such that 

f{xs)>l* + ^, foralKs>si. (24) 
Since x* is the limit of the sequence {xs}, there exists S2 G N such that 

\\xs' - Xs^W < min \ \ i , for all s' > s" > S2, (25) 




where we define 1/fi := +cxi if /x = 0. Let — max{si, S2}. For each s > s*, it follows from 
the definition of the functions (jgix) that 

qs{xs+i) > qs,{^s+i), 
> qs,{xs+i), 

= fi^sj + \7fj,{Xsj'^{Xs+l - Xs,)~ '^\\Xs+l - Xs.lp. 

where € {1, . . . , n} is the index of the gradient that determines the value of qs, (xs+i). Now 
by applying the Cauchy-Schwarz inequality and then using the inequalities ( 24 ) and ( 25 1 we 
arrive at 

QsiXs+l) > f{Xs,)-\\'^fjAx*)\\\\Xs+l-XsJ-'^\\Xs+l-Xsj'^, 

- ^ 2) " (^'12;^) " (2 ■ 6^ 

^3' 

Using the definition Is+i — ^^(xs+i), it follows that 

Is+i > I* + 1:, for ah s > s*. 

Since S > 0, this contradicts our assumption that I* is the limit of the non-decreasing sequence 
{Is}. Therefore, we have f{x*) < 1* + S for all 5 > 0, or equivalently f{x*) < I*. Since 
Is < f{x) for all s € N and x E B, it follows that I* < f{x*), which establishes that 
f{x*) — I* < f{x) for all X E B. Therefore, x* is a global minimizer of (P). The assertion is 
proved by repeating the same argument for any other limit point of the sequence {xk}. □ 
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6 Practical Issues 

The distance to uncontroUability from a time-invariant linear control system (A^ B) (see Sec- 



tion 7.3 for the formal definition of the distance to uncontroUability) where A e C"^" and 
B G C"^"' with n > m has the eigenvalue optimization characterization 

inf a„([ A-zI B ]). 
zee ^' 

Below we apply Algorithm[T]to calculate the distance to uncontroUability for a pair of random 
matrices (of size 100 x 100 and 100 x 30, respectively) whose entries are selected from a normal 
distribution with zero mean and unit variance. The running times (in seconds) in Matlab for 
every thirty iterations are given in the table below. 



iterations 


1-30 31-60 61-90 91-120 121-150 


cpu-time 


3.283 6.514 10.372 17.069 28.686 



Without doubt the later iterations are expensive. At iteration s there are s quadratic programs 



of the form (21 ). Each of the s quadratic programs has s — 1 -I- c? constraints. When we add 
the sth quadratic model, we create a new quadratic program for which we need to calculate 

its potentially [ * o ^ ) vertices. (Many of these potential vertices would turn out to 



2 

be infeasible in practice; but this is difficult to know in advance.) Furthermore each of the 
existing s — 1 quadratic programs has now one more constraint, consequently each of the 
existing s — 1 quadratic programs has potentially s — 2 -I- d additional vertices. We need to 
calculate these vertices as well. The total number of vertices that need to be calculated at 
iteration s is O(s^). Calculation of these vertices dominates the computation time eventually 
even for a system with large system matrices {A,B\ This is a rare situation for which the 
computation time is dominated by the solution of 2 x 2 linear systems! Obviously the situation 
does not get any better in higher dimensions. In the d dimensional case the work at the sth 
iteration would be 0{s'^^. 

Consider the following two schemes both of which use s quadratic functions in the 2- 
dimcnsional case. 

(1) Form a piece- wise quadratic model consisting of s quadratic functions on a x I2 box. 

(2) Split the l\ X I2 box into four sub-boxes, each of size ^ x ^. Then use a piece- wise 
quadratic model consisting of s/4 quadratic functions inside each sub-box. 

The latter scheme is 16 times cheaper. Furthermore our quadratic models capture the eigen- 
value function much better in a small box, since our quadratic models are defined in terms of 
derivatives, that is the local information. 

It appears wiser not to let s grow too large. But we also would like s to be not too small so 
that the piece- wise quadratic model can capture the eigenvalue function to a degree. Once the 
cost of adding a new quadratic function becomes expensive, we can split a box into sub-boxes, 
and construct piece-wise quadratic functions inside each sub-box from scratch separately. We 
start with 2'^ sub-boxes of equal size. Apply Algorithm [l] on each sub-box until either the 
prescribed accuracy e is reached, or the number of quadratic functions exceeds a prescribed 
value Uq. Then we have a global upper bound resulting from the function values evaluated so 
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far on various sub-boxes. We further partition each sub-box into sub-boxes if the lower bound 
for the sub-box (resulting from the piece-wise quadratic model particularly constructed for 
the sub-box) is less than the global upper bound minus the prescribed tolerance e. In practice 
we observe that many of the sub-boxes do not need to be further partitioned, because their 
lower bounds are larger than the global upper bound minus e. The practical algorithm is 
summarized below. 



Algorithm 2 Mesh- Adaptive Multidimensional Algorithm 



Require: A continuous and piece-wise function / : M — > M that is defined in terms of 
differentiable functions /j for j = 1, . . . , n each of which is analytic along every line in M.'^, 
a scalar 7 > satisfying (16), a set ;B given by (19), a tolerance parameter e > 0, and a 
parameter Uq for the number of quadratic functions. 
Partition B into boxes Bi, . . . , of equal size, 
u 00 

For j = 1,...,2'' do 
loop 

Apply Algorithm [1] on Bj but with the constraint that s does not exceed Ug. Let Ij and 
Uj be the returned lower, upper bounds. Let xj be the returned xbest. 
if Uj < u then 

u ^ Uj] a; Xj 
end if 
end loop 
I -fr- mhij^i 2d{lj} 
if w — / > e then 

Sort the boxes ,Bi , . . . , B2d according to their upper bounds ui , . . . , from smallest 
to largest. Sort also the lower and upper bounds of the boxes accordingly. 
For j = 1, ... ,2"* do 
loop 

if [u — tj) > e then 

Apply Algorithm [2] on Bj. Let Ij and Uj be the returned lower, upper bounds. Let 
Xj be the returned xbest. 
if Uj < u then 

U Uj] X Xj 

end if 
end if 
end loop 

£ -s— n\inj^i 2d{lj} 
end if 

Output: l,u,x. 



Sorting the boxes according to their upper-bounds makes the algorithm greedy. When we 
further partition, we start with the box yielding the smallest function value so far, continue 
with the one yielding the second smallest function value and so on. There are a few further 
practical improvements that could be made. 

• When Algorithm [T] is called inside Algorithm [2] it is possible to benefit from the global 
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upper bound u. Algorithm [T] could terminate once mm{u,Us} ~ Ig < e instead of 
Us - Is < e- 

• Furthermore, in Algorithm [T] for each of the quadratic program if the optimal value 
exceeds u — e, then the quadratic program could be discarded at the later iterations. 
The reasoning is simple; the eigenvalue function over the feasible region of the quadratic 
program is verified to be no smaller than u — e. If the eigenvalue function really takes a 
value smaller than u — e, it will be in some other region. If the eigenvalue function does 
not take a value smaller than u — e, then there is no reason to further search anyway; 
eventually the lower bounds from other regions will also approach or exceed u — e. 

For the example mentioned at the opening of this section Algorithm [T] calculates the 
distance to uncontroUability as 0.785, and guarantees that the exact value can differ from 
this computed value by at most 0.23 after 150 iterations and 66 seconds. Algorithm [2] on the 
other hand with Uq = 30 requires 55, 121, 171 and 214 seconds to calculate the distance to 
uncontroUability within an error of 10^^, 10~^, 10~® and 10^^, respectively. Just to guarantee 
10^^ accuracy Algorithm [l] iterates 305 times and uses 1298 seconds of cpu time. 



7 Applications to Eigenvalue Optimization 



We reserve this section for the applications of the algorithm introduced and analyzed in 
the previous four sections to particular eigenvalue optimization problems. For each problem 
we deduce expressions for derivatives in terms of eigenvectors using (31 and (111. It may 
be possible to deduce the bound 7 for the second derivatives using (IS I, though we will not 
attempt it here. 



7.1 Numerical Radius 

The numerical radius r{A) of a square matrix A e jg modulus of the outer-most 

point in its field of values [20] 

F{A) = {z*Az e C : z e C" s.t. ||z||2 = 1}. 

The numerical radius gives information about the powers of A, e.g., WA'^W < 2r{A)''. In 
literature it is used to analyze the convergence of iterative methods for the solution of linear 
systems [TTIIB]. 

The facts that the right-most intersection point of F{A) with the real axis is given by 



Ai 



A + A* 
2 



and F{Ae'^^) is same as F{A) rotated 9 radians in the counter clock- wise direction together 
imply the eigenvalue optimization characterization 



r(A) = max Ai (A(e)) 

eG[0,2ir] 



where A{9) (Ae*^ + A*e-''^)/2. 
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Clearly A{0) is analytic and Hermitian at all 6. If the multiplicity of Ai {A{0)) is one, 
then it is equal to one of the unordered eigenvalues in Theorem |2.1| in a neighborhood of 6. 
In this case Xi{9) := Ai (.4(6')) is analytic, and its derivative can be deduced as 

= vliO) Real {iAe'^) vi{0). (26) 

from ([3]) where vi{9) is the analytic unit eigenvector associated with Xi{6). The bound 7 for 
the second derivative depends on the norm of A; it is larger for A with larger norm. 
Here we specifically illustrate the algorithm on matrices 

^„ = Pn - (n/20) • iRn 

of various sizes where P„ is an n x n matrix obtained from a finite difference discretization of 
the Poisson operator, and i?„ is a random n x n matrix with entries selected from a normal 
distribution with zero mean and unit variance. This is a carefully chosen challenging example, 
as Xi{9) has many local maxima. The largest four eigenvalues of A{9) are displayed in Figure 
[3] on the left when n = 100. The number of local maxima typically increases as the size 
of the input matrix increases. Note that the largest four eigenvalues do not intersect each 
other, consequently all of them are analytic for this particular example. The plot of the 
second derivative is also given in Figure [3] on the right. For the particular example the second 
derivative varies in between -49 and 134, and ||Aioo|| = 241. 

The number of function evaluations by our one-dimensional algorithm applied to calculate 
r{An) for n = 100,400,900 are summarized in Table [T] The cpu-times in seconds are also 
provided in Table [l] in paranthesis. We set 7 = ||^n|| for each An, even though this is a gross 
over-estimate. Smaller 7 values obviously require fewer function evaluations. For instance for 
the last line in Tablejljwe choose 7 = HAgooH — 2689, whereas in reality the second derivative 
never drops below —300. Thus choosing 7 = 300 yields exactly the same numerical radius 
value for e = 10^^^ but after 41 function evaluations and 88 seconds of cpu-time (instead of 
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n / e 


10^4 


io-« 


io-« 


10-10 


10-^^ 


100 


45 (1.0) 


54 (1.2) 


64 (1.4) 


73 (1.6) 


81 (1.9) 


400 


44 (9.0) 


54 (10.9) 


65 (12.9) 


74 (14.6) 


83 (17.5) 


900 


67 (156) 


77 (177) 


88 (201) 


99 (225) 


119 (267) 



Table 1: Number of function evalutions (or iterations) and cpu-times in seconds (in paran- 
thesis) of the one-dimensional algorithm on the Poisson-random matrices A„ of various sizes 



e 


10-^ 


10-^ 


10"^ 


10-4 


# of func. eval. 


573 


1427 


4013 


11591 



Table 2: Number of function evaluations (or iterations) for the Piyavskii-Shubert algorithm 
on the matrix Aiqq for various accuracy 



119 function evalutions and 267 seconds). But it may be difficult to know such a value of 7 
in advance. 

The notable thing from Table [T] is that the asymptotic rate of convergence appears to 
be linear, i.e., every two decimal digit accuracy requires about ten additional iterations. For 
instance the number of iterations required by the Piyavskii-Shubert algorithm applied to 
calculate r(A„) for n — 100 to reach e — 10^^, 10^^, lO^"^, lO-** accuracy are listed in Table 
[2j The Piyavskii-Shubert algorithm requires a global Lipschitz constant for the eigenvalue 
function. Here we choose it as 7 = ||Aioo||, i.e., the expression (26) implies that in the 
worst case the derivative of the eigenvalue function can be as large as ||Aioo||. Clearly Table 
[2] indicates sub-linear convergence for the Piyavskii-Shubert algorithm. Significantly more 
iterations are required to reach 10~^ accuracy from 10^^ as compared to the number of 
iterations required to reach 10~^ accuracy from 10^^ accuracy. For the optimization of a 
Lipschitz continuous function with Lipschitz constant 7 the first thing that comes to mind is a 
brute- force grid search. The idea is to split an interval of length £ containing a global niinimizer 
or global maximizer into equal sub-intervals each of length ^(7/(2e)). Then evaluating the 
function at all grid-point and taking the smallest would guarantee that error cannot exceed e. 
For AiQQ this naive idea would require more than 10^^ function evaluations with e — 10^^^. 

The algorithm described in [27] for the numerical radius is one of the most reliable tech- 
niques at a reasonable cost at the moment. It is not based on derivatives, rather it is based 
on finding the level sets of Ai (9). The results for the numerical radius of An listed in the 
last column of Table [T] match with the algorithm in [^Tj up to 12 decimal digits. But the 
specialized algorithm in [27^ appears to be slower as indicated by Table [3j 



n 


100 


400 


900 


cpu-time 


1.9 


48 


603 



Table 3: cpu-times in second for the algorithm in [27 on An for various n 
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7.2 Hoc norm 

One of the two most widely-used norms in practice for the time-invariant linear control system 

x'{t)^ Ax{t)+Bu{t) 
y{t) = Cx{t) + Du{t) 

is the norm (with the other common norm being the H2 norm). Above u{t) is called the 
control input, y{t) is called the output, and A e C"^", B G €"^"",(7 € Cp^",L» € 
with rrijp < n are the system matrices. We say that the system above (more precisely the 
state-space description of the system) is of order n. In the Laplace domain this system with 
x{0) — can be represented as 

Y{s) = H{s)U{s) 

where U{s),Y{s) denote the Laplace transformations of u{t),y{t), respectively, and H{s) :— 
(C(s)(s/ — A)^^B{s) + D{s)) is called the transfer function of the system. The iJoo norm of 
the transfer function is defined as 

sup CTi {H{is)) , 

and is same as the oo norm of the operator that maps u{t) to y{t) in the time domain. For 
instance in Hao model reduction |10l the purpose is to find a smaller system of order r 
such that the operator of the reduced-order system is as close to the operator of the original 
systems as possible with respect to the 00-norm. 

The Hoo norm is well-defined only when A is stable, i.e., all of its eigenvalues lie on the left 
half of the complex plane. In this case the matrix function A{s) H{is) is analytic over the 
real line. Whenever ai(s) := cti {H{is)) is of multiplicity one and non-zero, the singular value 
(Ti(s) matches with one of the unordered and unsigned analytic singular values discussed in 
Section [273] in a small neighborhood of s. Then its derivative is given by 

^ = Imag (ui(s)* C{sil ~ A)-^B wi(s)) 
as 



from expression (11) where ui{s),vi{s) is a consistent pair of unit left and right singular 
vectors associated with ai{s). 

We experiment with the system (j4„. i?„, C„, £)„) of order n for various values of n resulting 
from a finite difference discretization of the heat equation with a control input and a control 
output [22, Example 3.2]. We slightly perturb An in each case so that the optimization 
problem becomes more challenging. In Figure |4] the function (Ti(s) is displayed together with 
(s*, (Ti(s*)) marked by an asterisk where s* is the computed global maximizer of (Ti(s). Figure 
[sjdisplays the second derivative of cri(s), which seems to lie in the interval [—11,3]. 

The number of iterations and cpu-times required by the one-dimensional algorithm for a 
perturbed variant of the control system (A„, i?„, C„, £*„) resulting from the heat equation of 
order n = 100,200,400,800 are listed in Table [ij Here we set 7 = \\A-'^\\ = Once 
again the algorithm appears to be converging linearly, i.e., every two decimal digit accuracy 
requires about a fixed number additional function evalutations. Piyavskii-Shubert algorithm 
again converges sub-linearly; for instance for the system of order n = 100 the number of 
function evaluations necessary to reach e = 10^^, 10^"*, 10^^ accuracy are 71, 635, 5665, 
respectively. 
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2 - 



1.5 - 




Figure 4: The plot of ai{s) :— ai{H{is)) for the dynamical system resulting from the Heat 
equation together with (s*,cri(s*)) marked by an asterisk where is the computed global 
maximizer 

4p 

2 - 

- 
-2 - 
-4 - 
-6 - 
-8 - 
-10 - 



■~ -6 -4 -2 2 4 6 

Figure 5: The plot of the second derivative of 0-1(5) :— ai{H{is)) for the dynamical system 
resulting from the Heat equation 



n / e 


10-4 


10"^ 


10-8 


10-10 


100 


23 (0.3) 


32 (0.5) 


39 (0.5) 


47 (0.6) 


200 


22 (1.5) 


29 (1.9) 


36 (2.3) 


44 (2.8) 


400 


18 (8.3) 


24 (10.8) 


29 (12.9) 


34 (17.6) 


800 


16 (53) 


19 (63) 


22 (73) 


27 (92) 



Table 4: Number of function evalutions (or iterations) and cpu-times in seconds (in paran- 
thesis) of the one-dimensional algorithm for calculating the Hoo norm of the control systems 
resulting from the heat equation 
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7.3 Distance to Uncontrollability 

The controllability of a time-invariant linear control system means that the system can be 
driven into any state at a particular time by some input u{t). This property solely depends 
on the differential part 

x'{t) = Ax{t) + Bu{t) 

of the state-space description from the previous subsection, and could be equivalently char- 
acterized as 

rank([ B AB A^B ... A''-^B])=n 

or 

rank {[ A -zl B ]) = n Vz G C. 

The controllability is a fundamental property just as stability. For instance if a system is 
not controllable, then it is not minimal in the sense that there are systems of smaller order 
mapping the input to the output in exactly the same manner as the original system. 
Paige [55] suggested the distance to uncontrollability 

inf{||[ AB]\\^: x' (t) ^ {A + AA)x{t) + {B + AB)u{t)} 

as a robust measure of controllability. This problem has the eigenvalue optimization charac- 
terization [12] 

mina„([ A~zl B ]) . 

As in the previous subsection, since the matrix function A{z) := [ A — zl _B ] is analytic, 
we conclude that cr„ (a;i,a;2) := (Xn (-^(a; + 1^2)) is differentiable and analytic along every line 

in whenever it is non-zero and is of multiplicity one. Let u,i(a;) G C", Vn{uj) ~ ^"('^l 

£"n+m ^[ii^ Vn{uj) G C",'0„(a;) G C™ be a consistent pair of unit left and right singular vectors 



associated with tT„(w). Then by (11) the gradient is given by 



Vct„(w) = (-Real(u*(w)w„(a;)) , Imag (u* (w){;„(a;))) 

The level sets of the function crn(a;) are shown in Figure[6]for the perturbed control system 
resulting from the heat equation of the previous subsection of order n — 30. Clearly the 
function is highly non-convex with multiple local minima. We apply Algorithm [2] to calculate 
the distance to uncontrollability for the heat equation examples of the previous subsection. 
The number of function evaluations and cpu-times in seconds are listed for the systems of 
order n = 100, 200, 400 in Table [Sj In all cases we set 7 = 2. 

The rate of convergence is not a very meaningful criterion for Algorithm [2] since it is 
mesh-based. But we again observe that every two decimal digit accuracy does not increase 
the number of function evaluations significantly. A brute-force grid based method for a 
Lipschitz function with Lipschitz constant 7 on a rectangle of size £1 x £2 would require 
{£1 ■ 7) X {£2 • 7)/(2 • e^) function evaluations for e accuracy. For the heat example and with 
tolerance e = 10~* a brute-force grid approach would amount to more than 10^® function 
evaluations. None of the existing algorithms that we are aware of, such as [S] (TSl [111 [TTj , is 
capable of solving a 400 x 400 example to half of the precision at a reasonable time. 
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Figure 6: The level-sets of the function an{uJi,0J2) '■= o-n{A{uji + iuj2)) for the distance to 
uncontrollability from the perturbed control system of order n = 30 resulting from the heat 
equation are displayed. 



n / e 


10-^ 


10-4 


10-*^ 


io-« 


100 


345 (38) 


548 (56) 


747 (73) 


850 (82) 


200 


456 (53) 


569 (65) 


767 (84) 


1066 (113) 


400 


615 (315) 


734 (374) 


849 (427) 


1047 (521) 



Table 5: Number of function evalutions (or iterations) and cpu-times in seconds (in paranthe- 
sis) of Algorithm [2] for calculating the distances to uncontrollability from the control systems 
resulting from the heat equation 
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7.4 Distance to Defectiveness 



Distance to a nearest defective matrix from a square matrix A E C"^" 

inf {IIAAII2 : AA G C"""" s.t. {A + AA) is defective} 

is mentioned in the book of |36j as a possible measure of the sensitivity of the worst-conditioned 
eigenvalue of A. Later it is confirmed [3TJ |37] that indeed the distance to defectiveness from 
A is large if and only if A has a highly sensitive eigenvalue. For this distance Malyshev |25] 
deduced the eigenvalue optimization characterization 



mm max a2r, 

AGC7G[0,1] 



A -XI -fl 
A~XI 



Unlike the problems in the previous subsections the eigenvalue characterization is in the 
min-max form. But Algorithm [2] is still applicable. The function that we need to minimize is 
defined in terms of the functions 



/,(Ai,A2) 



for j = 1, . . . ,4n where 



^(Ai,A2,7) 



max A,(yl(Ai, A2,7)) 
7e[o,i] 



S(Ai,A2,7) 
S*(Ai,A2,7) 



(27) 



with 



S(Ai,A2,7) 



A~{Xi+iX2)I -fl 

A-{Xi + iX2)I 

It follows from the derivation of the eigenvalue optimization characterization (see [3S]) that 
the maximization problem in (27) for all fj{X) such that |/j(A)| < CT2n-i(A) are unimodal 
where 



0'2n-l(Al, A2) 



max o-2n-i 

76 [0,1] 



A-{Xi+iX2)I -fl 

A-{Xi+iX2)I 



This means that all such fj{X) are differentiable and analytic along every line in M^. These 
are the pieces defining the function (J2n-i at A. By continuity they remain to be the defining 
functions for a2n-i in a neighborhood of A. Perform an analytic extension to fj if necessary 
outside of this neighborhood. At other A outside the neighborhood defining functions may 
be different. But there are finitely many such neighborhoods on a bounded domain. Con- 
sequently, cr2„_i(A) is defined by finitely many functions each of which is differentiable and 
analytic along every line in W^. 

In practice the defining functions remain the same inside the box for all X E B. Indeed it 
seems reasonable to assume that CT2ri-i is defined only by one function fj{X) inside B excluding 
the non-generic cases. Then the function i72Tt-i (A) is differentiable and analytic along lines 



in M? . Suppose that the maximization problem ( 27 ) is attained at 7* . From [3] we have 



Va2„_i(A) = V/j(A) 



aA,-(^(A,7,)) dX,{A{X,^,)) 
dXi ' 8X2 
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Figure 7: (Left) the level-sets of the function i72n-i(Ai, A2) that needs to be minimized 
for the distance to defectiveness for a 5 x 5 penta-diagonal Toeplitz matrix; (Right) the e 
pseudospectrum of the same matrix for e equal to its distance to defectiveness together with 
asterisks marking the defective eigenvalue of the nearest matrix 



It follows from (11) that 



Vf72n-i(A) = (-Real(u;„„;^(A)u2„-i(A)), Imag(u;„_;^(A)u2„_i(A))) 



where U2ri-i(A), f2ji-i(A) is a consistent pair of unit left and right singular vectors associated 
with 

" A-XI 7,/ 
A-\I 



0'2n-l(A) :— 0'2ri-l 



The level sets of the function cr2ri-i(A) is displayed for the 5x5 matrix T = diag(l, —10, 0, 10, 1) 
on the left in Figure [7] which reveals the non-convex nature of the function to be minimized. 
It is well-known that the distance from A to the nearest defective matrix is related to the 
e-pseudospectrum of A defined as 



UA) (J AiX), 

\\X-A\\2<<i 



where A(-) denotes the spectrum of its argument. For an n x n matrix with distinct eigenvalues 
this set consists of n disconnected components, one component around each eigenvalue. The 
distance from A to the nearest defective matrix is the smallest e such that two components 
of Ae (A) coalesce [U [7] . Furthermore the point of coalescence is the defective eigenvalue of 
the nearest defective matrix. For the 5x5 example T the e-pseudospectrum is displayed on 
the right in Figure [t] for various e. The outer- most curves represent the boundary of Ae(r) 
for e = 3.753, the computed distance to defectiveness by Algorithm [2] Two components of 
the outer-most curve coalesce at A* — —0.336 — il3.6 marked by an asterisks, which is the 
computed defective eigenvalue of the nearest matrix. 
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The inner minimization problems are solved by means of the secant method, which requires 
the derivatives with respect to 7 for a fixed A. Analytic expressions can again be derived from 
(111 for these derivatives. The distance to the nearest defective matrix is same as the distance 
to the nearest matrix with a multiple eigenvalue. The reason is that for any matrix A with 
a multiple eigenvalue, there are defective matrices arbitrarily close to A. In [35] eigenvalue 
optimization characterizations were derived for the more general problem, the distance to the 
nearest matrix with an eigenvalue of specified algebraic multiplicity. The discussions in this 
subsection extends for the numerical solution of these more general distances as well. 



8 Software 

Matlab implementations of the one-dimensional algorithm and the 2-dimensional version of 
Algorithm [2] are available on the wetQ The user of the routines is expected write down a 
matlab routine calculating the eigenvalue function as well as its derivative, or gradient in the 
2-dimensional case, at a given point. The user must also provide 7, an upper bound on the 
second derivatives in absolute value. 



9 Conclusion 

We introduced the first generic algorithm tailored for the optimization of the eigenvalues 
of a Hermitian matrix function depending on its parameters analytically. The algorithm 
is guaranteed to converge to a global optimizer. In practice we observe linear convergence 
unlike other global optimization algorithms exploiting the Lipschitzness or boundedness of 
the function and converging sub-linearly. This is largely due to the fact that the algorithm 
makes use of the derivatives, and constructs piece-wise quadratic models. 

The computational difficulty with the algorithm is that in the multi-dimensional case 
negative definite quadratic functions need to be minimized subject to linear constraints. This 
problem is NP-hard, but the solution is guaranteed to be attained at one of the vertices of its 
feasible region. In small dimensions these quadratic problems can be solved efficiently; but 
in high dimensions they are not tractable. A Matlab implementation of the 2-dimensional 
algorithm is available. We plan to extend the implementation for up to 4 or 5-dimensional 
case. But for higher dimensional problems the negative definite quadratic programs need to 
be replaced by positive definite programs. This is a subject that we hope to tackle in the near 
future. 
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